Cross-language mapping for small-vocabulary ASR in under-resourced languages: investigating the impact of source language choice

Authors

  • Anjana Sofia Vakil
  • Alexis Palmer
Abstract

For small-vocabulary applications, a mapped pronunciation lexicon can enable speech recognition in an under-resourced target language by using an out-of-the-box recognition engine built for a high-resource source language. Existing algorithms for cross-language phoneme mapping enable the fully automatic creation of such lexicons from just a few minutes of audio, making speech-driven applications feasible in any language. What such methods have not considered is whether careful selection of the source language, based on the linguistic properties of the target language, can improve recognition accuracy; this paper reports on a preliminary exploration of this question. Results from a first case study seem to indicate that phonetic similarity between the target and source languages does not significantly impact accuracy, underscoring the language-independence of such techniques.
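As a rough illustration of what a mapped pronunciation lexicon is, the sketch below rewrites each target-language word's pronunciation in the phoneme set of a source-language recognizer. Everything in it is hypothetical: the mapping table `PHONE_MAP`, the ARPAbet-style source symbols, the two example words, and the helper names. It is not the authors' method. The automatic algorithms mentioned above derive pronunciations from a few minutes of audio rather than from a hand-written table. The sketch only shows the end product: each source language implies its own table, which is why the choice of source language can matter at all.

```python
from typing import Dict, List

# Hypothetical target-to-source phoneme mapping. Source symbols are ARPAbet-style
# labels such as an English recognizer might use; a real mapping would be derived
# automatically (e.g. from a few minutes of audio), not written by hand.
PHONE_MAP: Dict[str, List[str]] = {
    "a": ["AA"],
    "e": ["EH"],
    "i": ["IY"],
    "o": ["OW"],
    "u": ["UW"],
    "k": ["K"],
    "m": ["M"],
    "n": ["N"],
    "t": ["T"],
    # A target affricate with no single source equivalent is approximated
    # by a sequence of source phonemes.
    "ts": ["T", "S"],
}


def map_pronunciation(target_phones: List[str], phone_map: Dict[str, List[str]]) -> List[str]:
    """Rewrite one target-language pronunciation in source-language phonemes."""
    mapped: List[str] = []
    for phone in target_phones:
        if phone not in phone_map:
            raise KeyError(f"no source-language mapping for target phoneme {phone!r}")
        mapped.extend(phone_map[phone])
    return mapped


def build_mapped_lexicon(
    target_lexicon: Dict[str, List[str]], phone_map: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Map every word of a small target-language vocabulary."""
    return {word: map_pronunciation(phones, phone_map) for word, phones in target_lexicon.items()}


def format_lexicon(lexicon: Dict[str, List[str]]) -> List[str]:
    """Render entries as 'word<TAB>PH1 PH2 ...' lines, sorted by word."""
    return [f"{word}\t{' '.join(phones)}" for word, phones in sorted(lexicon.items())]


if __name__ == "__main__":
    # Two made-up target-language words with hand-written phoneme strings.
    vocabulary = {"kato": ["k", "a", "t", "o"], "tsuni": ["ts", "u", "n", "i"]}
    for line in format_lexicon(build_mapped_lexicon(vocabulary, PHONE_MAP)):
        print(line)
```

The tab-separated output is only a placeholder for whatever lexicon format a particular recognition engine expects. Raising `KeyError` on an unmapped phoneme makes gaps in the mapping visible instead of silently dropping sounds from a word.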

Similar resources

Analysis of Mismatched Transcriptions Generated by Humans and Machines for Under-Resourced Languages

When speech data with native transcriptions are scarce in an under-resourced language, automatic speech recognition (ASR) must be trained using other methods. Semi-supervised learning first labels the speech using ASR from other languages, then re-trains the ASR using the generated labels. Mismatched crowdsourcing asks crowd-workers unfamiliar with the language to transcribe it. In this paper, ...

A first LVCSR system for Luxembourgish, an under-resourced European language

Luxembourgish is embedded in a multilingual context on the divide between Romance and Germanic cultures and remains one of Europe’s under-described languages. We describe our efforts in building a large vocabulary ASR system for such a “minority” language (target language: Luxembourgish) without any transcribed audio training data. Instead, acoustic models are derived from major languages (sou...

Investigating the Effect of Morphology Instruction through Semantic Mapping on Vocabulary Learning of Iranian Intermediate EFL Learners

The aim of this study was to investigate the effect of morphology instruction through semantic mapping on vocabulary learning of Iranian intermediate EFL learners. To do so, 50 out of 70 students were selected from one English language institute by administering a PET test. Then, they were assigned into two groups randomly as experimental and control groups. A pretest (teacher made) was adm...

SMT-based ASR domain adaptation methods for under-resourced languages: Application to Romanian

This study investigates the possibility of using statistical machine translation to create domain-specific language resources. We propose a methodology that aims to create a domain-specific automatic speech recognition (ASR) system for a low-resourced language when in-domain text corpora are available only in a high-resourced language. Several translation scenarios (both unsupervised and semi-su...

Speed Perturbation and Vowel Duration Modeling for ASR in Hausa and Wolof Languages

Automatic Speech Recognition (ASR) for (under-resourced) Sub-Saharan African languages faces several challenges: small amount of transcribed speech, written language normalization issues, few text resources available for language modeling, as well as specific features (tones, morphology, etc.) that need to be taken into account seriously to optimize ASR performance. This paper tries to address ...

Publication date: 2014